Search CORE

148 research outputs found

Non-linear Convolution Filters for CNN-based Learning

Author: Daras Petros
Doumanoglou Alexandros
Vretos Nicholas
Zoumpourlis Georgios
Publication venue
Publication date: 23/08/2017
Field of study

During the last years, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in image classification. Their architectures have largely drawn inspiration by models of the primate visual system. However, while recent research results of neuroscience prove the existence of non-linear operations in the response of complex visual cells, little effort has been devoted to extend the convolution technique to non-linear forms. Typical convolutional layers are linear systems, hence their expressiveness is limited. To overcome this, various non-linearities have been used as activation functions inside CNNs, while also many pooling strategies have been applied. We address the issue of developing a convolution method in the context of a computational model of the visual cortex, exploring quadratic forms through the Volterra kernels. Such forms, constituting a more rich function space, are used as approximations of the response profile of visual cells. Our proposed second-order convolution is tested on CIFAR-10 and CIFAR-100. We show that a network which combines linear and non-linear filters in its convolutional layers, can outperform networks that use standard linear filters with the same architecture, yielding results competitive with the state-of-the-art on these datasets.Comment: 9 pages, 5 figures, code link, ICCV 201

arXiv.org e-Print Archive

Crossref

Skeleton-based Human Action Recognition using Basis Vectors

Author: Petros Daras
Stylianos Asteriadis
Publication venue
Publication date: 01/01/2015
Field of study

Automatic human action recognition is a research topic that has attracted significant attention lately, mainly due to the advancements in sensing technologies and the improvements in computational systems’ power. However, complexity in human movements, input devices’ noise and person-specific pattern variability impose a series of challenges that still remain to be overcome. In the proposed work, a novel human action recognition method using Microsoft Kinect depth sensing technology is presented for handling the above mentioned issues. Each action is represented as a basis vector and spectral analysis is performed on an affinity matrix of new action feature vectors. Using simple kernel regressors for computing the affinity matrix, complexity is reduced and robust low-dimensional representations are achieved. The proposed scheme loosens action detection accuracy demands, while it can be extended for accommodating multiple modalities, in a dynamic fashion

Maastricht University Research Portal

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Deep Affordance-grounded Sensorimotor Object Recognition

Author: Daras Petros
Papadopoulos Georgios Th.
Potamianos Gerasimos
Thermos Spyridon
Publication venue
Publication date: 10/04/2017
Field of study

It is well-established by cognitive neuroscience that human perception of objects constitutes a complex process, where object appearance information is combined with evidence about the so-called object "affordances", namely the types of actions that humans typically perform when interacting with them. This fact has recently motivated the "sensorimotor" approach to the challenging task of automatic object recognition, where both information sources are fused to improve robustness. In this work, the aforementioned paradigm is adopted, surpassing current limitations of sensorimotor object recognition research. Specifically, the deep learning paradigm is introduced to the problem for the first time, developing a number of novel neuro-biologically and neuro-physiologically inspired architectures that utilize state-of-the-art neural networks for fusing the available information sources in multiple ways. The proposed methods are evaluated using a large RGB-D corpus, which is specifically collected for the task of sensorimotor object recognition and is made publicly available. Experimental results demonstrate the utility of affordance information to object recognition, achieving an up to 29% relative error reduction by its inclusion.Comment: 9 pages, 7 figures, dataset link included, accepted to CVPR 201

arXiv.org e-Print Archive

Crossref

Natural User Interfaces for Virtual Character Full Body and Facial Animation in Immersive Virtual Worlds

Author: Konstantinos Cornelis Apostolakis
Petros Daras
Publication venue
Publication date: 15/08/2015
Field of study

In recent years, networked virtual environments have steadily grown to become a frontier in social computing. Such virtual cyberspaces are usually accessed by multiple users through their 3D avatars. Recent scientific activity has resulted in the release of both hardware and software components that enable users at home to interact with their virtual persona through natural body and facial activity performance. Based on 3D computer graphics methods and vision-based motion tracking algorithms, these techniques aspire to reinforce the sense of autonomy and telepresence within the virtual world. In this paper we present two distinct frameworks for avatar animation through user natural motion input. We specifically target the full body avatar control case using a Kinect sensor via a simple, networked skeletal joint retargeting pipeline, as well as an intuitive user facial animation 3D reconstruction pipeline for rendering highly realistic user facial puppets. Furthermore, we present a common networked architecture to enable multiple remote clients to capture and render any number of 3D animated characters within a shared virtual environment

ZENODO

Comprehensive Comparison of Deep Learning Models for Lung and COVID-19 Lesion Segmentation in CT scans

Author: Bizopoulos Paschalis
Daras Petros
Vretos Nicholas
Publication venue
Publication date: 03/02/2021
Field of study

Recently there has been an explosion in the use of Deep Learning (DL) methods for medical image segmentation. However the field's reliability is hindered by the lack of a common base of reference for accuracy/performance evaluation and the fact that previous research uses different datasets for evaluation. In this paper, an extensive comparison of DL models for lung and COVID-19 lesion segmentation in Computerized Tomography (CT) scans is presented, which can also be used as a benchmark for testing medical image segmentation models. Four DL architectures (Unet, Linknet, FPN, PSPNet) are combined with 25 randomly initialized and pretrained encoders (variations of VGG, DenseNet, ResNet, ResNext, DPN, MobileNet, Xception, Inception-v4, EfficientNet), to construct 200 tested models. Three experimental setups are conducted for lung segmentation, lesion segmentation and lesion segmentation using the original lung masks. A public COVID-19 dataset with 100 CT scan images (80 for train, 20 for validation) is used for training/validation and a different public dataset consisting of 829 images from 9 CT scan volumes for testing. Multiple findings are provided including the best architecture-encoder models for each experiment as well as mean Dice results for each experiment, architecture and encoder independently. Finally, the upper bounds improvements when using lung masks as a preprocessing step or when using pretrained models are quantified. The source code and 600 pretrained models for the three experiments are provided, suitable for fine-tuning in experimental setups without GPU capabilities.Comment: 10 pages, 8 figures, 2 table

arXiv.org e-Print Archive

A Deep Learning Approach to Object Affordance Segmentation

Author: Daras Petros
Potamianos Gerasimos
Thermos Spyridon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/04/2020
Field of study

Learning to understand and infer object functionalities is an important step towards robust visual intelligence. Significant research efforts have recently focused on segmenting the object parts that enable specific types of human-object interaction, the so-called "object affordances". However, most works treat it as a static semantic segmentation problem, focusing solely on object appearance and relying on strong supervision and object detection. In this paper, we propose a novel approach that exploits the spatio-temporal nature of human-object interaction for affordance segmentation. In particular, we design an autoencoder that is trained using ground-truth labels of only the last frame of the sequence, and is able to infer pixel-wise affordance labels in both videos and static images. Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism that enables the implicit localization of the interaction hotspot. For evaluation purposes, we introduce the SOR3D-AFF corpus, which consists of human-object interaction sequences and supports 9 types of affordances in terms of pixel-wise annotation, covering typical manipulations of tool-like objects. We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF, while being able to predict affordances for similar unseen objects in two affordance image-only datasets.Comment: 5 pages, 4 figures, ICASSP 202

arXiv.org e-Print Archive

Crossref